
136 research outputs found

Awkward Arrays in Python, C++, and Numba

Author: Elmer Peter
Lange David
Pivarski Jim
Publication venue: 'EDP Sciences'
Publication date: 02/07/2020

The Awkward Array library has been an important tool for physics analysis in Python since September 2018. However, some interface and implementation issues have been raised in Awkward Array's first year that argue for a reimplementation in C++ and Numba. We describe those issues and the new architecture, and present some examples of how the new interface will look to users. Of particular importance are the separation of kernel functions from data structure management, which allows a C++ implementation and a Numba implementation to share kernel functions, and the algorithm that transforms record-oriented data into columnar Awkward Arrays. Comment: To be published in CHEP 2019 proceedings, EPJ Web of Conferences; post-review update

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Big Data in HEP: A comprehensive use case study

Author: Cremonesi Matteo
Elmer Peter
Gutsche Oliver
Jayatilaka Bo
Kowalkowski Jim
Pivarski Jim
Sehrish Saba
Suárez Cristina Mantilla
Svyatkovskiy Alexey
Tran Nhan
Publication venue: 'IOP Publishing'
Publication date: 12/03/2017

Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed. Comment: Proceedings for 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)

arXiv.org e-Print Archive

CERN Document Server

Analysis Description Languages for the LHC

Author: Gras Philippe
Gray Lindsey
Krikler Benjamin
Pivarski Jim
Prosper Harrison B.
Rizzi Andrea
Sekmen Sezen
Unel Gokhan
Watts Gordon
Publication date: 01/01/2020

An analysis description language is a domain-specific language capable of describing the contents of an LHC analysis in a standard and unambiguous way, independent of any computing framework. It is designed for use by anyone with an interest in, and knowledge of, LHC physics, i.e., experimentalists, phenomenologists and other enthusiasts. Adopting analysis description languages would bring numerous benefits for the LHC experimental and phenomenological communities, ranging from analysis preservation beyond the lifetimes of experiments or analysis software to facilitating the abstraction, design, visualization, validation, combination, reproduction, interpretation and overall communication of the analysis contents. Here, we introduce the analysis description language concept and summarize the ongoing efforts to develop such languages and the tools to use them in LHC analyses. Comment: Accepted contribution to the proceedings of The 8th Annual Conference on Large Hadron Collider Physics, LHCP2020, 25-30 May, 2020, online

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

CERN Document Server

The Scikit HEP Project -- overview and prospects

Author: Burr Chris
Das Pratyush
Dembinski Hans
Feickert Matthew
Krikler Benjamin
Marinangeli Matthieu
Nandi Jaydeep
Pivarski Jim
Rodrigues Eduardo
Schreiner Henry
Smirnov Dmitri
Smith Nick
Publication venue: 'EDP Sciences'
Publication date: 01/01/2020

Scikit-HEP is a community-driven and community-oriented project with the goal of providing an ecosystem for particle physics data analysis in Python. Scikit-HEP is a toolset of approximately twenty packages and a few "affiliated" packages. It expands the typical Python data analysis tools for particle physicists. Each package focuses on a particular topic and interacts with other packages in the toolset, where appropriate. Most of the packages are easy to install in many environments; much work has been done this year to provide binary "wheels" on PyPI and conda-forge packages. The Scikit-HEP project has been gaining interest and momentum by building a user and developer community that engages in collaboration across experiments. Some of the packages are being used by other communities, including the astroparticle physics community. An overview of the overall project and toolset will be presented, as well as a vision for development and sustainability. Comment: 6 pages, 3 figures, Proceedings of the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019), Adelaide, Australia, 4-8 November 2019

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

Using Big Data Technologies for HEP Analysis

Author: Bellini Claudio
Bian Bianny
Canali Luca
Cremonesi Matteo
Dimakopoulos Vasileios
Elmer Peter
Evangelos Evangelos
Fisk Ian
Girone Maria
Gutsche Oliver
Hoh Siew-Yan
Jayatilaka Bo
Khristenko Viktor
Luiselli Andrea
Melo Andrew
Olivito Dominick
Pazzini Jacopo
Pivarski Jim
Svyatkovskiy Alexey
Zanetti Marco
Publication venue: 'EDP Sciences'
Publication date: 01/01/2019

The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at a high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results in a timely and efficient manner. Recently, new technologies and new approaches have been developed in industry to meet the need to retrieve information as quickly as possible when analyzing PB and EB datasets. Providing scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of the application of the new tools both quantitatively, by measuring performance, and qualitatively, by focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data collected by the CMS experiment to 1 TB of data in a format suitable for physics analysis within 5 hours. The second goal is achieved by implementing multiple physics use-cases in Apache Spark, using as input preprocessed datasets derived from official CMS data and simulation. By performing different end-analyses up to the publication plots on different hardware, feasibility, usability and portability are compared to those of a traditional ROOT-based workflow.

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation

At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented. Comment: arXiv admin note: text overlap with arXiv:1712.0659

arXiv.org e-Print Archive

CERN Document Server

Explore Bristol Research

A Roadmap for HEP Software and Computing R&D for the 2020s

Author: Albrecht Johannes
Alves Antonio Augusto, Jr
Amadio Guilherme
Andronico Giuseppe
Anh-Ky Nguyen
Aphecetche Laurent
Apostolakis John
Asai Makoto
Atzori Luca
Babik Marian
Bagliesi Giuseppe
Bandieramonte Marilena
Banerjee Sunanda
Barisits Martin
Bauerdick Lothar A. T.
Belforte Stefano
Benjamin Douglas
Bernius Catrin
Bhimji Wahid
Bianchi Riccardo Maria
Bird Ian
Biscarat Catherine
Blomer Jakob
Bloom Kenneth
Boccali Tommaso
Bockelman Brian
Bold Tomasz
Bonacorsi Daniele
Boveia Antonio
Bozzi Concezio
Bracko Marko
Britton David
Buckley Andy
Buncic Predrag
Calafiura Paolo
Campana Simone
Canal Philippe
Canali Luca
Carlino Gianpaolo
Castro Nuno
Cattaneo Marco
Cerminara Gianluca
Cervantes Villanueva Javier
Chang Philip
Chapman John
Chen Gang
Childers Taylor
Clarke Peter
Clemencic Marco
Cogneras Eric
Coles Jeremy
Collier Ian
Colling David
Corti Gloria
Cosmo Gabriele
Costanzo Davide
Couturier Ben
Cranmer Kyle
Cranshaw Jack
Cristella Leonardo
Crooks David
Crépé-Renaudin Sabine
Currie Robert
Dallmeier-Tiessen Sünje
De Cian Michel
De Roeck Albert
De Kaushik
Delgado Peris Antonio
Derue Frédéric
Di Girolamo Alessandro
Di Guida Salvatore
Dimitrov Gancho
Doglioni Caterina
Dotti Andrea
Duellmann Dirk
Duflot Laurent
Dykstra Dave
Dziedziniewicz-Wojcik Katarzyna
Dziurda Agnieszka
Egede Ulrik
Elmer Peter
Elmsheuser Johannes
Elvira V. Daniel
Eulisse Giulio
Farrell Steven
Ferber Torben
Filipcic Andrej
Fisk Ian
Fitzpatrick Conor
Flix José
Formica Andrea
Forti Alessandra
Foundation HEP Software
Franzoni Giovanni
Frost James
Fuess Stu
Gaede Frank
Ganis Gerardo
Gardner Robert
Garonne Vincent
Gellrich Andreas
Genser Krzysztof
George Simon
Geurts Frank
Gheata Andrei
Gheata Mihaela
Giacomini Francesco
Giagu Stefano
Giffels Manuel
Gingrich Douglas
Girone Maria
Gligorov Vladimir V.
Glushkov Ivan
Gohn Wesley
Gonzalez Lopez Jose Benito
González Caballero Isidro
González Fernández Juan R.
Govi Giacomo
Grandi Claudio
Grasland Hadrien
Gray Heather
Grillo Lucia
Guan Wen
Gutsche Oliver
Gyurjyan Vardan
Hanushevsky Andrew
Hariri Farah
Hartmann Thomas
Harvey John
Hauth Thomas
Hegner Benedikt
Heinemann Beate
Heinrich Lukas
Heiss Andreas
Hernández José M.
Hildreth Michael
Hodgkinson Mark
Hoeche Stefan
Holzman Burt
Hristov Peter
Huang Xingtao
Ivanchenko Vladimir N.
Ivanov Todor
Iven Jan
Jashal Brij
Jayatilaka Bodhitha
Jones Roger
Jouvin Michel
Jun Soon Yung
Kagan Michael
Kalderon Charles William
Kane Meghan
Karavakis Edward
Katz Daniel S.
Kcira Dorian
Keeble Oliver
Kersevan Borut Paul
Kirby Michael
Klimentov Alexei
Klute Markus
Komarov Ilya
Konstantinov Dmitri
Koppenburg Patrick
Kowalkowski Jim
Kreczko Luke
Kuhr Thomas
Kutschke Robert
Kuznetsov Valentin
Lampl Walter
Lancon Eric
Lange David
Lassnig Mario
Laycock Paul
Leggett Charles
Letts James
Lewendel Birgit
Li Teng
Lima Guilherme
Linacre Jacob
Linden Tomas
Livny Miron
Lo Presti Giuseppe
Lopienski Sebastian
Love Peter
Lyon Adam
Magini Nicolò
Marshall Zachary L
Martelli Edoardo
Martin-Haugh Stewart
Mato Pere
Mazumdar Kajari
McCauley Thomas
McFayden Josh
McKee Shawn
McNab Andrew
Mehdiyev Rashid
Meinhard Helge
Menasce Dario
Mendez Lorenzo Patricia
Mete Alaettin Serhan
Michelotto Michele
Mitrevski Jovan
Moneta Lorenzo
Morgan Ben
Mount Richard
Moyse Edward
Murray Sean
Nairz Armin
Neubauer Mark S
Norman Andrew
Novaes Sérgio
Novak Mihaly
Oyanguren Arantza
Ozturk Nurcan
Pacheco Pages Andres
Paganini Michela
Pansanel Jerome
Pascuzzi Vincent R.
Patrick Glenn
Pearce Alex
Pearson Ben
Pedro Kevin
Perdue Gabriel
Perez-Calero Yzquierdo Antonio
Perrozzi Luca
Petersen Troels
Petric Marko
Petzold Andreas
Piedra Jónatan
Piilonen Leo
Piparo Danilo
Pivarski Jim
Pokorski Witold
Polci Francesco
Potamianos Karolos
Psihas Fernanda
Puig Navarro Albert
Quast Günter
Raven Gerhard
Reuter Jürgen
Ribon Alberto
Rinaldi Lorenzo
Ritter Martin
Robinson James
Rodrigues Eduardo
Roiser Stefan
Rousseau David
Roy Gareth
Rybkine Grigori
Sailer Andre
Sakuma Tai
Santana Renato
Sartirana Andrea
Schellman Heidi
Schovancová Jaroslava
Schramm Steven
Schulz Markus
Sciabà Andrea
Seidel Sally
Sekmen Sezen
Serfon Cedric
Severini Horst
Sexton-Kennedy Elizabeth
Seymour Michael
Sgalaberna Davide
Shapoval Illya
Shiers Jamie
Shiu Jing-Ge
Short Hannah
Siroli Gian Piero
Skipsey Sam
Smith Tim
Snyder Scott
Sokoloff Michael D
Spentzouris Panagiotis
Stadie Hartmut
Stark Giordon
Stewart Gordon
Stewart Graeme
Sánchez Arturo
Sánchez-Hernández Alberto
Taffard Anyes
Tamponi Umberto
Templon Jeff
Tenaglia Giacomo
Tsulaia Vakhtang
Tunnell Christopher
Vaandering Eric
Valassi Andrea
Vallecorsa Sofia
Valsan Liviu
Van Gemmeren Peter
Vernet Renaud
Viren Brett
Vlimant Jean-Roch
Voss Christian
Votava Margaret
Vuosalo Carl
Vázquez Sierra Carlos
Wartel Romain
Watts Gordon T.
Wenaus Torre
Wenzel Sandro
Williams Mike
Winklmeier Frank
Wissing Christoph
Wuerthwein Frank
Wynne Benjamin
Xiaomei Zhang
Yang Wei
Yazgan Efe
Publication date: 18/12/2017

Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade. Peer reviewed

Universidade do Minho: RepositoriUM

Hal - Université Grenoble Alpes

HAL Clermont Université

Helsingin yliopiston digitaalinen arkisto

HAL-CEA

Hal-Diderot

arXiv.org e-Print Archive

HAL-IN2P3